Multi-genome comparisons reveal gain-and-loss evolution of anti-Mullerian hormone receptor type 2 as a candidate master sex-determining gene in Percidae

Background The Percidae family comprises many fish species of major importance for aquaculture and fisheries. Based on three new chromosome-scale assemblies in Perca fluviatilis, Perca schrenkii, and Sander vitreus along with additional percid fish reference genomes, we provide an evolutionary and comparative genomic analysis of their sex-determination systems. Results We explored the fate of a duplicated anti-Mullerian hormone receptor type-2 gene (amhr2bY), previously suggested to be the master sex-determining (MSD) gene in P. flavescens. Phylogenetically related and structurally similar amhr2 duplicates (amhr2b) were found in P. schrenkii and Sander lucioperca, potentially dating this duplication event to their last common ancestor around 19–27 Mya. In P. fluviatilis and S. vitreus, this amhr2b duplicate has been likely lost while it was subject to amplification in S. lucioperca. Analyses of the amhr2b locus in P. schrenkii suggest that this duplication could be also male-specific as it is in P. flavescens. In P. fluviatilis, a relatively small (100 kb) non-recombinant sex-determining region (SDR) was characterized on chromosome 18 using population-genomics approaches. This SDR is characterized by many male-specific single-nucleotide variations (SNVs) and no large duplication/insertion event, suggesting that P. fluviatilis has a male heterogametic sex-determination system (XX/XY), generated by allelic diversification. This SDR contains six annotated genes, including three (c18h1orf198, hsdl1, tbc1d32) with higher expression in the testis than in the ovary. Conclusions Together, our results provide a new example of the highly dynamic sex chromosome turnover in teleosts and provide new genomic resources for Percidae, including sex-genotyping tools for all three known Perca species. Supplementary Information The online version contains supplementary material available at 10.1186/s12915-024-01935-9.


Background
The percid family (Percidae, Rafinesque) encompasses a large number (over 250) of diverse ecologically and economically important fish species, assigned to 11 genera [1].Two genera, Perca and Sander, are found across both Eurasia and North America, with separate species native to each continent (Eurasia: Perca fluviatilis/Sander lucioperca; North America: Perca flavescens/Sander vitreus).Percids are classically described as typical freshwater species of the Northern hemisphere, even if some species can be regularly found in brackish waters (e.g., Sander lucioperca, Perca fluviatilis).In the context of declining fisheries over the past few decades, but also due to their high value and good market acceptance, 4 percid species-Perca flavescens (yellow perch) and Sander vitreus (walleye) in North America and P. fluviatilis (European perch) and S. lucioperca (zander) in Eurasia are particularly promising for aquaculture.Rearing these fish in recirculation aquaculture systems allows for a control of reproduction and a year-round production of stocking fish [2,3].Although year-round production represents an important competitive goal, current production targets premium markets and an up-scaling of production faces several bottlenecks [4].
Among these bottlenecks is better control of the sex of developing individuals because in both Perca and Sander genera, females grow faster than males [5,6].Due to faster female growth (up to 25-50% in Perca, 10% in Sander), all-female stocks are highly desirable.In Perca fluviatilis, sex determination has been assumed to be male heterogametic (XX/XY) based on gynogenesis or hormonal treatment experiments [7,8].These methodologies also produced genetic female but phenotypic male individuals (neomales) that can be used to produce allfemale stocks by crossing normal XX females with these chromosomally XX neomales.This approach would, however, greatly benefit from a reliable sexing method allowing the identification of genetic sex early during development to select rare genetically XX neomales as future breeders in aquaculture.In P. flavescens, an XX female/XY male heterogametic genetic sex-determination system has been also recently uncovered, with duplication/insertion of an anti-Mullerian hormone receptor type 2 (amhr2) gene as a potential master sex-determining gene [9].
Genes encoding many members of the transforming growth factor beta (TGF-β) gene family, including anti-Mullerian hormone (amh) and anti-Mullerian hormone receptor type-2 (amhr2), have repeatedly and independently evolved as MSD genes in vertebrates [10].For instance, amh has been characterized or suspected to be the MSD gene in pikes [11,12], Nile tilapia [13], lumpfish [14], Sebastes rockfish [15], lingcod [16], and Patagonian pejerrey [17].The cognate receptor gene of Amh, amhr2, has also been found as a potential MSD gene in Pangasiidae [18], Takifugu [19], Ayu [20], common seadragon and alligator pipefish [21], and in yellow perch [9].The repeated and independent recruitment of TGF-β receptors, including Amhr2, in teleost fish sex determination is even more puzzling as many of these MSD genes, encoding a TGFβ receptor, share a similar N-terminal truncation [9,18,21], supporting their evolution towards a ligand-independent mechanism of action [18].Therefore, the extent of evolutionary conservation of the Y-linked amhr2bY gene found in yellow perch in closely related species (genus Perca and Sander) is an important question with implications for better sex control in these aquaculture species and for understanding the evolution of sex linkage and protein structure.
Regarding genomics of Percidae, two long-read reference quality genome assemblies have recently been published for P. flavescens and S. lucioperca [9,22].While for P. fluviatilis and S. vitreus, only draft genomes, generated from short-read sequencing, have been available [23].Here, we provide three new long-read chromosomescale genome assemblies for P. fluviatilis, P. schrenkii, and Sander vitreus and thus complete genomic resources for the economically most important species of Percidae.These data enabled us to develop polymerase chain reaction (PCR) genotyping-assays for sexing of all three Perca species and shed light on gene gain-and-loss in the evolution of an old MSD gene in Percidae.

Genome assemblies of Perca fluviatilis, Perca schrenkii, and Sander vitreus
The genome of P. fluviatilis was sequenced to high coverage using Oxford Nanopore long-read sequencing (estimated coverage, 67-fold/N50 read length, 11.9 kbp), and Hi-C data was generated to allow for chromosomelevel assembly (coverage, 52-fold/alignable pairs, 89.1%/ Hi-C map, see Additional file 1: Fig. S1).The final assembly yielded a highly complete reference genome (99.0% of sequence assigned to 24 chromosomes (N50 length, 39.6 Mbp) and highly continuous contigs (N50 length: 4.1 Mbp)).Compared to a previously published genome assembly of P. fluviatilis, obtained from "linked-shortreads" (10X Genomics), these numbers represent a 316fold improvement of contig continuity and a 6.3-fold increase of scaffold continuity.The better continuity resulted in an increased percentage of predictable genes (BUSCO results below).Genome assembly statistics are also highly congruent with the previously published reference quality P. flavescens genome, except for obvious size differences as the P. fluviatilis assembly is about 8.1% larger than the P. flavescens assembly and represents the largest genome known in the genus Perca (Table 1).
The genome of P. schrenkii was assembled by a hybrid assembly method, which was highly efficient regarding long-read sequencing coverage and read length needed (here only 30-fold/N50 read length, 4.95 kb).De novo assembled contigs from short reads were combined with long reads and scaffolded using our CSA pipeline [25], with the P. flavescens as the closest reference genome (Fig. 1: divergence time about 7.1 Mya) for genome comparison and inferring chromosomallevel sequences.Using this approach, we were able to assemble the genome of P. schrenkii to similar quality as those obtained for P. flavescens and P. fluviatilis (94.7% assigned to 24 chromosomes/contig N50 length, 3.2 Mbp; Table 1).The genome assembly size of P. schrenkii was in between the other two Perca sp.genomes (877 Mbp < 908 Mbp < 951 Mbp).
The genome of Sander vitreus was assembled from long reads (coverage, 12-fold/N50 read length, 10 kb) and short reads similar to procedures for P. schrenkii, but we used two reference genomes for chromosomal assembly.First, contigs of S. lucioperca (closest relative, divergence time 9.8 Mya) served to order the S. vitreus contigs (N50, 6.2 Mb), which improved scaffold N50 significantly (result.N50, 16.8 Mb), then these nearly chromosome-scale scaffolds were ordered according to the P. flavescens (div.time 19.1 Mya) Hi-C chromosomes (result.N50, 33.3 Mb/96.5% assigned to 24 chromosomes).This two-step approach resulted in more consistent results than just using the S. lucioperca chromosomal scaffolds, which were generated by genetic linkage mapping.We observed similar genome size differences as in Perca sp. between both Sander species.The S. vitreus genome assembly (791 Mb) is smaller than the one of S. lucioperca (901 Mb); thus, the North American Sander and Perca species tend to have smaller genome sizes than their Eurasian relatives (Table 1).
De novo repeat analysis showed that 60.1% of the genome assembly size difference between Sander vitreus and S. lucioperca could be explained by repeat expansion/reduction.Similarly, for Perca flavescens and P. fluviatilis about 64.8% of the genome size differences were due to repeat expansion/reduction.In both species pairs, most repeat expansions/reductions were observed in repeat elements classified as "unknown." Regarding annotated repeat element classes, L2, DNA, Helitron, CMC-EnSpm, hAT, Rex-Babar, hAT-Charlie, and Piggy-Bac elements expanded the most in both Eurasian species (together adding roughly 20 Mbp to the genomes of S. lucioperca and P. fluviatilis), while a clear expansion of only a single repeat element, called RTE-BovB, was found in both North American species (adding about 3 and 7 Mbp of sequence to S. vitreus and P. flavescens, respectively; Additional file 2: Table S1).
The genome of P. fluviatilis was annotated by NCBI/ GNOMON, which included ample public RNAseq data and protein homology evidence.For P. schrenkii and S. vitreus, we transferred the NCBI/GNOMON annotations of P. flavescens and S. lucioperca, respectively.BUSCO analysis (Table 2) revealed values larger than 95.9% for complete BUSCOs (category "C:") for all annotations.The comparative annotation approach resulted only in small losses (category "M:") of a few BUSCO genes in the range of 0.4-1.1%.In this regard, the S. vitreus assembly performed better than the P. schrenkii assembly, possibly due to the higher N50 read length of the underlying longread data.

Percomorpha phylogenomics and divergence time estimation
To calculate the phylogenetic tree of 36 Percomorpha species and their divergence times, we aligned whole genomes and extracted the non-coding alignments (Fig. 1).The use of non-coding sequences is preferable to calculate difficult-to-resolve phylogenetic trees that occur after massive radiations [26][27][28].Our multiple alignment consisted of 6,594,104 nt residues (2,256,299 distinct patterns; 1,652,510 parsimony-informative; 1,136,496 singleton sites; 3,805,098 constant sites) and resulted in a highly supported tree (raxml-ng and IQtree2 topologies were identical; all IQtree2's SH-aLRT and ultrafast bootstrapping (UFBS) values were 100).According to the divergence time analysis, most Percomorpha orders emerged after

The fate of amhr2 genes during evolution of Perca and Sander species
In the genome of P. flavescens, two amhr2 paralogs were previously described, i.e., an autosomal gene, amhr2a, present in both males and females on chromosome 04 (Chr04), and a male-specific duplication on the Y-chromosome (Chr09), amhr2bY [9].A similar amhr2 gene duplication was also found in our male P. schrenkii assembly, and sequence homologies and conserved synteny analyses show that these two P. schrenkii amhr2 genes are orthologs of P. flavescens amhr2a and amhr2bY, respectively (Fig. 2A and AB).Genotyping of one male and one female also suggests that the amhr2bY gene could be male-specific in P. schrenkii (Additional file 3: Fig. S2, Table 3), as described in P. Fig. 2 Evolution of amhr2 genes in Percidae.A Orthologs of amhr2a were identified in genome assemblies, and their gene tree is consistent with the species tree, which was found by the phylogenomics approach (see Fig. 1).The amhr2b duplicates were only found in the genome assemblies of P. flavescens (single copy, male-specific), P. schrenkii (single copy, potentially male-specific), and S. lucioperca (three copies, no clear sex linkage), and they clustered together.Thus, amhr2b stems from a gene duplication event that occurred at the origin of Percidae (19-27 Mya), and the absence of amhr2b in P. fluviatilis and S. vitreus suggests a secondary loss event in these species.This tree was calculated on codon position 1 and 2 alignments and achieved the best bootstrap support (red numbers: SH-aLRT/UFBS support values) for the split of the amhr2a and amhr2b clades.Trees based on complete CDS, CDS + introns, and amino acid sequences resulted in the same topology albeit with lower bootstrap support for some splits (Additional file 10: Fig. S9).Numbers after species names depict the coordinates of the respective amhr2 genes in the genome assemblies.B, C Conserved synteny around the amhr2a (B) and amhr2b (C) loci in some Percidae species.These multiple duplications (in S. lucioperca) or the loss of the amhr2b genes (in P. fluviatilis and S. vitreus) emphasize that amhr2b may be considered a "jumping" gene locus, which is also supported by conserved synteny analysis flavescens [9].This potential sex linkage is also supported by a half coverage of reads in the genomic region containing the amhr2bY locus in our male P. schrenki genome assembly (Additional file 4: Fig. S3), in agreement with the hemizygosity of a male-specific region on the Y in species with a XX/XY sex-determination system.Alignment of the P. schrenkii amhr2bY ortholog shows that its coding sequence (CDS) shares 98% identity with the P. flavescens amhr2bY CDS and 95.7% identity at the protein level the P. flavescens Amhr2bY.
As in P. flavescens [9], the P. schrenkii amhr2bY gene encodes a N-terminal-truncated type II receptor protein that lacks most of the cysteine-rich extracellular part of the receptor, which is crucially involved in ligand-binding specificity [29] (Fig. 3).In contrast, in our P. fluviatilis male genome assembly, only one copy of amhr2 could be identified, and sequence homologies and conserved synteny analysis (Fig. 2A, B) show that this amhr2 gene is the ortholog of P. flavescens and P. schrenkii amhr2a.In addition, PCR with primers designed to amplify both amhr2a and amhr2bY from P. flavescens and P. schrenkii did not show any sex differences in P. fluviatilis.Altogether, these results support the absence of an amhr2b gene in P. fluviatilis.
In the Sander lucioperca male genome assembly, four copies of amhr2 were detected, and like in P. flavescens and P. schrenkii, the deduced duplicated S. lucioperca Amhr2b proteins have the same structural modifications with a deletion of part of their N-terminal region extracellular domain (see Fig. 3).Using the same primers as those used for the amplification of both amhr2a and amhr2bY in P. flavescens and P. schrenkii, PCR genotyping on males and females of S. lucioperca produced complex amplification patterns with multiple  bands and no visible association with sex.In the publicly available genome assembly of Sander vitreus, sequence homologies and/or conserved synteny analysis (Fig. 2A, B) allowed the identification of a single autosomal amhr2a gene.
A phylogenetic analysis of the sequences with similarity to amhr2 in Perca and Sander (Fig. 2A, B) shows that the different amhr2 genes likely originated from a gene duplication event that happened in the branch leading to the LCA of these species, dated around 19-27 Mya.This hypothesis of a common origin of this amhr2 duplication in the last common ancestor of these species is also corroborated by the fact that these amhr2b duplications all share the same deletion in their extracellular N-terminal domain (see Fig. 3).The most parsimonious hypothesis is then a common origin of the duplication of the amhr2b gene that has been conserved in P. schrenkii and P. flavescens, lost in P. fluviatilis and S. vitreus, and amplified in S. lucioperca.

Evolution of sex determination in P. fluviatilis
Because P. fluviatilis sex determination does not rely on an amhr2 duplication like what has been found in P. flavescens [9] and P. schrenkii (this study), we used genome-wide approaches to better characterize its sex-determination system.Restriction site-associated DNA sequencing (RAD-Seq) analysis on 35 males and 35 females of P. fluviatilis, carried out with a minimum read depth of one, allowed the characterization of a single significant sex marker (Additional file 5: Fig. S4).This 94 bp RAD sequence matches (Blast Identities, 93/95%) a portion of P. fluviatilis chromosome 18 (GENO_Pfluv_1.0,Chr18: 27656212-27656305).This RAD-Seq analysis suggests that Chr18 could be the P. fluviatilis sex chromosome and that its sex-determining region could be very small because we only detected a single significant sex-linked RAD sequence.To get a better characterization of the P. fluviatilis sex chromosome and sex-determining region, we then used pool sequencing (Pool-Seq) to re-analyze the same samples used for RAD-Seq by pooling together DNA from the males in one pool and DNA from females in a second pool.Using these Pool-Seq datasets, we identified a small 100-kb region on P. fluviatilis Chr18 with a high density of male-specific SNVs (Fig. 4), confirming the RAD-Seq hypothesis that Chr18 is the sex chromosome in that species.No male-specific duplication/insertion event was found in this sex-determining region on Chr18, which contains six annotated genes (Fig. 4D).These genes encode a protein of unknown function (C18h1orf198), three gap-junction proteins (Cx32.2,Gja13.2, and Cx32.7), a protein annotated as inactive hydroxysteroid dehydrogenase-like protein (Hsdl1), and a protein known as protein broad-minded or Tcb1 domain family member 32 protein (Tbc1d32).Three of these six annotated genes, i.e., c18h1orf198, hsdl1 and tbc1d32, display a higher expression in testis than in the ovary (Additional file 6: Fig. S5).
To provide a better support for the sex linkage of the male-specific variants found within this sex-determining region on Chr18, we designed different types of genotyping assays (classical PCR and KASpar PCR) that have been applied to different P. fluviatilis individuals which were phenotypically sexed with confidence.A classical PCR assay was first developed based on the detection of a Y-allele-specific 27-bp deletion in the third intron of the P. fluviatilis hsdl1 gene (Additional file 7: Fig. S6, Table 3), and this assay successfully identified all males of a Lake Mueggelsee from Germany (10 males and 9 females; p-value of association with sex = 9.667e−05).In addition, KASpar allele-specific PCR assays were developed based on 7 single nucleotide sex-specific variants, located at different positions within the Chr18 sex-determining region of P. fluviatilis (Fig. 4D).Tests of 48 males and 48 females showed that of seven KASpar allele-specific PCR-assays, 5 resulted in a high proportion of correctly genotyped individuals with males being heterozygote and females being homozygote (> 95%).Two of the targeted SNVs (SNV1 and SNV3) that displayed 100% sex linkage accuracy (Additional file 8: Fig. S7, Tables 3 and 4).Sex linkage of SNV1 was then checked on a wild-type population from Kortowskie Lake in Poland for which the association of male phenotype and SNV1 heterozygosity was also complete (17 males and 20 females; p-value = 8.83e−09, Table 3).

Discussion
The Percidae family is represented by 239 species and 11 genera.The genera Perca and Sander are especially important for aquaculture and fisheries.By providing new genome sequence assemblies for Perca fluviatilis, P. schrenckii, and Sander vitreus, we provide for the first-time access to all economically important species of the Percidae at the DNA-level.Assembly statistics for these three new genome sequence assemblies are reference grade with N50 continuity in the megabase range and chromosomal length scaffolds, obtained either by Hi-C scaffolding or by conserved synteny analysis involving the closest relatives.Importantly, completeness on the gene-level is significantly higher than in a short read-based draft genome for P. fluviatilis, published earlier [30], allowing much stronger conclusions to be drawn regarding the presence or absence of possible sex-determining genes.

The origin of Percidae
A phylogenomic approach using aligned non-coding sequences of 36 genomes resulted in a highly supported tree showing a rapid radiation of fish families.It has recently been shown that phylogenomics based on non-coding sequences may be more reliable at resolving difficult-toresolve radiations in species trees (i.e., in Aves).According to our time calibration, many taxonomic orders related to the Percidae emerged shortly after the Cretaceous-Paleogene (K-Pg) mass extinction event, about 66 Mya, and gave rise to Percidae about 59 Mya.This is in contrast to many older studies, which have, for example, dated the split Similar patterns in rapid radiations have been observed in the avian tree of life and have likewise been attributed to the K-Pg mass extinction [27,28].In context, it has been argued that the so-called Lilliput Effect, which describes the selection in favor of species with small body sizes and fast generation times after mass extinction events, can lead to an increase in substitution rates and results in overestimations of node-ages for molecular clocks [31].

The evolution of sex determination in Perca and Sander species Evolution and turnover of Amhr2
Sex-determination systems with MSD evolved from duplications of the amh [10][11][12][13][14][15][16][17], or amhr2 genes have now been characterized in many fish species, all with a male-heterogametic system (XX/XY).In addition, the fact that Amh in monotremes [32], or Amhr2 in some lizards [33], are Y-linked also makes them strong MSD gene candidates in other vertebrate species.In Percidae, sex determination has only been explored in some species of Perca [8,9], and Amhr2 has been characterized as a potential MSD gene in yellow perch, P. flavescens [9].Our results suggest that this duplication of amhr2 in P. flavescens is also shared by P. schrenkii and S. lucioperca, implying an origin of duplication in their last common ancestor, dated around 19-27 Mya.However, the fate of this duplication seems to be complex-with multiple duplications/insertions on different chromosomes with no clear sex linkage in S. lucioperca, a secondary loss in P. fluviatilis and S. vitreus, contrasting with a single potentially sex-linked duplication/insertion in P. schrenkii and in P. flavescens.This finding suggests that the shared ancestral amhr2b-duplicated locus (Fig. 2A) might be a jumping locus that has been moving around during its evolution as found for the sdY MSD jumping sex locus in salmonids [34,35].Additional evidence that these amhr2b genes originated a single ancestor also rely on the fact that the Amhr2b proteins of P. flavescens [9], P. schrenkii and S. lucioperca share a similar gene structure with an N-terminal truncation that results in the absence of the cysteine-rich extracellular part of the receptor.This part of the receptor is known to be crucial for ligand binding [36] and similar N-terminal truncations of duplicated amhr2 involved in sex determination were also described in catfishes from the Pangasidae family [18] and in the common seadragon [21].A N-terminal truncation has also been observed in a BMP receptor (Bmpr1b) acting as a master sex determining gene in the Atlantic herring, Clupea harengus, where this receptor retains the capacity to propagate specific intracellular signals through kinase activity and Smad protein phosphorylation despite its N-terminal truncation [37].Collectively, these results suggest that certain TGFβ receptors, even when truncated in their N-terminal extracellular ligandbinding domain, can still elicit a biological response independent of ligand activation.The convergent evolution of numerous fish MSD genes encoding a TGFβ receptor with analogous N-terminal truncation suggests the likelihood of this ligand-independent action as a crucial adaptive mechanism.In the genus Sander, the situation might be similar to that in Perca regarding the changes of the sex-determination systems between species.In S. lucioperca, amhr2b might still serve as the MSD gene, but the several recent amhr2b duplications have complicated our analysis so far.Similar to P. fluviatilis, S. vitreus has lost amhr2b and likely another factor took over as a potential MSD gene.

A new sex-specific locus in P. fluviatilis
The fact that P. fluviatilis sex does not rely on an amhr2 duplication like P. flavescens [9] and P. schrenkii does indicates that P. fluviatilis evolved a completely different and new MSD gene.Our results also show that this sex locus on P. fluviatilis Chr18 is very small compared Table 4 KASpar allele-specific PCR assays on seven single sex-specific nucleotide variations (SNV ID#) in P. fluviatilis.Numbers of homozygote (Ho), heterozygote (He), and uncalled genotypes (U).The p-value of association with sex was calculated for each SNV based on Pearson's chi-square test with Yates' continuity correction scoring heterozygote males and homozygote females as positives N total numbers of genotyped individuals (M/F: males/females), %N percentage of genotyped individuals, % As percentage of correctly assigned genotypes, i.e., male heterozygotes and female homozygotes to what is observed in many fish species, with a potentially non-recombining size around 100 kb.This locus however is not the smallest SD locus described in teleost fish: in the pufferfish, Takifugu rubripes, the sex locus is limited to a few SNPs that differentiate amhr2 alleles on the X-and Y-chromosomes [19].Because we did not find any sign of a sex chromosome-specific duplication/insertion event in the P. fluviatilis SDR, this sex locus seems to result from pure allelic diversification and is thus in contrast to P. flavescens [9] and probably also P. schrenkii (this study).The P. fluviatilis sex-specific region on Chr18 contains 6 annotated genes, which encode a protein of unknown function (C18h1orf198), 3 gap junction proteins (Cx32.2,Gja13.2, and Cx32.7), a protein annotated as inactive hydroxysteroid dehydrogenase-like protein (Hsdl1), and a protein known as protein broad-minded or Tcb1 domain family member 32 protein (Tbc1d32).Of these genes, hsdl1 and tcb1d32 are interesting potential MSD candidates in P. fluviatilis, based on their potential functions and the fact that both display a higher testicular expression compared to the ovary.The Hsdl1 protein is indeed annotated as "inactive" [38], but this annotation only refers to its lack of enzymatic activity against substrates so far tested, leaving other potential functional roles for a protein that is highly conserved in vertebrates [38].In Epinephelus coioides, hsdl1 has been shown to be differentially expressed during female-to-male sex reversal, and its expression profile clustered with hsd17b1 [39], which plays a central role in converting sex steroids and has recently been identified as a potential MSD gene in different species with a female heterogametic (ZZ/ZW) sex-determination system, like in oyster pompano, Trachinotus anak [40] and different amberjack species [41,42].The Tbc1d32 protein has been shown to be required for a high Sonic hedgehog (Shh) signaling in the mouse neural tube [43].Given the role of Shh signaling downstream of steroidogenic factor 1 (nr5a1) for the proper steroidogenic lineage fate [44] and the importance of steroids in gonadal sex differentiation [45,46], tbc1d32 would be also an interesting candidate as a potential MSD gene in Perca fluviatilis.

Conclusions
Our study shows that Percidae exhibit a remarkable high variation in sex determination mechanisms.This variation is connected to deletion or amplification of amhr2bY, which if lost in certain species (like Perca fluviatilis or Sander vitreus) should cause rewiring of the sex determining pathways and result in the rise of new sex-determination systems.The mechanisms behind a "jumping" amhr2bY expansion or loss and which genes replace it as the MSD remain to be elucidated.The new Percidae reference sequence assemblies presented here and the highly reliable sex markers developed for P. fluviatilis can now be applied for sex genotyping in basic science as well as in aquaculture.

Biological samples
In Perca fluviatilis, high molecular weight (HMW) genomic DNA (gDNA) for genome sequencing was extracted from a blood sample of a male called "Pf_M1" (BioSample ID SAMN12071746 [47]) from the aquaculture facility of the University de Lorraine, Nancy, France.Blood (0.5 ml) was sampled and directly stored in 25 ml of a TNES-Urea lysis buffer (TNES-Urea, 4 M urea; 10 mM Tris-HCl, pH 7.5; 125 mM NaCl; 10 mM EDTA; 1% SDS).HMW gDNA was extracted from the TNES-urea buffer using a slightly modified phenol/ chloroform protocol as described [12].For the chromosome contact map (Hi-C), 1.5 ml of blood was taken from the same animal and slowly (1 K/min) cryopreserved with 15% dimethyl sulfoxide (DMSO) in a Mr. Frosty Freezing Container (Thermo Fisher) at − 80 °C.Additional fin clip samples for RAD sequencing (RAD-Seq), pool sequencing (Pool-Seq), or sex-genotyping assays were collected and stored in 90% ethanol, either at the Lucas Perche aquaculture facility (Le Moulin de Cany, 57170 Hampont, France), at Kortowskie Lake in Poland, or at Mueggelsee Lake in Germany.Samples of Perca schrenkii were obtained for genome sequencing and sex genotyping from male and female wild catches at lake Alakol, Kazakhstan (46.328 N, 81.374 E, Additional file 9: Fig. S8).Different organs and tissues (brain, liver, muscle, ovary, testis) were sampled for genome and transcriptome sequencing (Biosample ID SAMN15143703, [48]) and stored in RNAlater.HMW gDNA for genome sequencing was extracted from brain tissue of the male P. schrenkii individual, using the MagAttract HMW DNA Kit (Qiagen, Germany).Total RNA for transcriptome sequencing was isolated using a standard TRIzol protocol, in combination with the RNAeasy Mini Kit (Qiagen, Germany).
For genome sequencing of Sander vitreus a fin clip of a male was sampled by Ohio Department of Natural Resources (Ohio, DNR) in spring 2017 and stored in 96% ethanol.The S. vitreus sample called "19-12246" originated from Maumee River, Ohio [41.554N; − 83.6605W].gDNA was extracted using the DNeasy Tissue Kit (Qiagen).Short DNA fragments were removed/reduced by size-selective, magnetic-bead purification using 0.35× of sample volume AMPure beads (Beckmann-Coulter) and two washing steps with 70% ethanol.

Sequencing
Genomic sequencing of P. fluviatilis was carried out using a combination of 2 × 250 bp Illumina short-reads, Oxford Nanopore long reads and a chromosome contact map (Hi-C).For long-read sequencing, DNA was sheared to 20 kb using the megaruptor system (Diagenode).ONT (Oxford nanopore technologies) library preparation and sequencing were performed using 5 µg of sheared DNA and ligation sequencing kits SQK-LSK108 or SQK-LSK109, according to the manufacturer's instructions.The libraries were loaded at a concentration of 0.005 to 0.1 pmol and sequenced for 48 h on 11 GridION R9.4 or R9.4.1 flowcells.Short read wgs (whole-genome shotgun) sequencing for consensus polishing of noisy long read assemblies was carried out by shearing the HMW DNA to approximately 500-bp fragments and using the Illumina Truseq X kit, according to the manufacturer's instructions.The library was sequenced using a read length of 250 bp in paired-end mode (HiSeq 3000, Illumina, California, USA).Hi-C library generation for chromosome assembly was carried out according to a protocol adapted from Foissac et al. [49].The blood sample was spun down, and the cell pellet was resuspended and fixed in 1% formaldehyde.Five million cells were processed for the Hi-C library.After overnight digestion with HindIII (NEB), DNA ends were labeled with Biotin-14-DCTP (Invitrogen), using Klenow fragment (NEB) and re-ligated.A total of 1.4 µg of DNA was sheared to an average size of 550 bp (Covaris).Biotinylated DNA fragments were pulled down using M280 Streptavidin Dynabeads (Invitrogen) and ligated to PE adaptors (Illumina).The Hi-C library was amplified using PE primers (Illumina) with 10 PCR amplification cycles.The library was sequenced using a HiSeq3000 (Illumina, California, USA), generating 150 bp paired-end reads.
Genomic sequencing of P. schrenkii and S. vitreus was carried out using Oxford Nanopore long reads on a Min-ION nanopore sequencer (Oxford Nanopore Technologies, UK) in combination with the MinIT system.Several libraries were constructed using the tagmentation-based SQK-RAD004 kit with varying amounts of input DNA (0.4 to 1.2 µg) from a male individual or using the ligation approach of the SQK-LSK109 kit (input DNA 2 µg).Libraries were sequenced on R9.4.1 flowcells with variable run times and exonuclease washes by the EXP-WSH003 kit to remove pore blocks and improve the data yield.Short-read wgs sequencing of P. schrenkii was conducted at BGI (BGI Genomics Co., Ltd.).A P. schrenkii male and a female wgs library (300 bp fragment length) were constructed and paired end reads of 150 bp length were generated on an Illumina Hiseq4000 system.Public short-read wgs data of S. vitreus were obtained from the NCBI Sequence Read Archive (SRA) using the accession SRR9711286 [50].Transcriptome sequencing of six P. schrenkii samples (female brain, male brain, male muscle, female liver, ovary, and testis) was conducted at BGI. Transcriptome sequencing libraries were constructed from total ribonucleic acid (RNA), applying enrichment of mRNA with oligo(dT) hybridization, mRNA fragmentation, random hexamer cDNA synthesis, size selection, and PCR amplification.Sequencing of 150-bp paired-end reads was performed by an Illumina HiSeq X Ten system.

Genome browsers and data availability
We provide UCSC genome browsers [67] for the five available Perca and Sander reference genomes (this study: P. fluviatilis, P. schrenkii, S. vitreus; earlier studies: P. flavescens [9] and S. lucioperca [22] at http:// genom es.igb-berlin.de/ Perci dae/.These genome browsers provide access to genomic sequences and annotations (either public NCBI GNOMON annotations or annotations resulting from our comparative approach).Blat [68] servers for each genome are available to align nucleotide or protein sequences.

Phylogenomics and divergence time estimation
We performed pair-wise whole-genome alignments of 36 teleost genome assemblies as in [69], using Last-aligner and Last-split [34] for filtering 1-to-1 genome matches and Multiz [35] for multiple alignment construction from pairwise alignments.We removed all annotated coding sequence (CDS) from that multiple alignment to retain non-coding alignments using the tool named mafsInRegion (https:// github.com/ ENCODE-DCC/ kentU tils) and a bed file with CDS coordinates.Subsequently, we calculated the species tree using iqtree2 and raxml-ng [70,71].We added the genomes of P. schrenkii, S. vitreus, and Etheostoma spectabile (GCF_008692095.1 [72]) to this dataset and re-analyzed the highly supported subclade containing Percidae species using several outgroups (Lates, Oreochromis, Pampus, and Thunnus sp.).We estimated the divergence times using a large subset of our multiple alignment (10 6 nt residues) and the approximate method of Mcmctree (Paml package version, [73]).We calibrated 5 nodes of the tree by left or right CI values, obtained from www. timet ree.org and applied independent rates or correlated rates clock models and the HKY85 evolutionary model.Approximately 10 8 samples were calculated, of which we used the top 50% for divergence time estimation.Each calculation was performed in two replicates, which were checked for convergence using linear regression.The final tree was plotted using FigTree (v1.4.4,http:// tree.bio.ed.ac.uk/ softw are/ figtr ee).

Perca fluviatilis RAD sequencing
Perca fluviatilis gDNA samples from 35 males and 35 females were extracted with the NucleoSpin Kit for Tissue (Macherey-Nagel, Duren, Germany), following the manufacturer's instructions.Then, gDNA concentrations were quantified with a Qubit3 fluorometer (Invitrogen, Carlsbad, CA) using a Qubit dsDNA HS Assay Kit (Invitrogen, Carlsbad, CA).RAD libraries were constructed from each individual's gDNA, using a previously described protocol with the single Sbf1 restriction enzyme [74].These libraries were sequenced on an Illumina HiSeq 2500.Raw reads were demultiplexed using the process_radtags.plwrapper script of stacks, version 1.44, with default settings [75], and further analyzed with the RADSex analysis pipeline [76] to identify sexspecific markers.

Perca fluviatilis pool sequencing
Sequencing of pooled samples (Pool-Seq) was carried out in Perca fluviatilis to increase the resolution of RAD sequencing for the identification of sex-specific signatures characteristic of its sex-determining region.The gDNA samples used for RAD sequencing were pooled in equimolar quantities according to their sex.Pooled male and pooled female libraries were constructed using a Truseq nano kit (Illumina, ref.FC-121-4001) following the manufacturer's instructions.Each library was sequenced in an Illumina HiSeq2500 with 2× 250 reads.Pool-Seq reads were analyzed as previously described [9,11,77,78] with the PSASS pipeline (psass version 2.0.0:https:// zenodo.org/ record/ 26159 36#.XtyIS 3s6_ AI) that computes the position and density of SNVs, heterozygous in one sex but homozygous in the other sex (sex-specific SNVs), and the read depths for the male and female pools along the genome to look for sex coverage differences.Psass was run with default parameters except -windowsize, which was set to 5,000, and -output-resolution, which was set to 1000.

PCR-based sex diagnostics
A Perca schrenkii PCR-based sex diagnostic test was designed based on multiple alignments of the different amhr2 genes in P. fluviatilis (one autosomal gene only), P. flavescens (two genes), and Perca schrenkii (two genes) to target a conserved region for all Perca amhr2 genes, allowing the design of PCR-primers that amplify both the autosomal amhr2a and the male-specific amhr2bY with Table 5 KASpar allele-specific PCR primers.For each allele (AL1 and AL2) primers, the X and Y sex chromosome-specific alleles are provided along with their sequences Perca fluviatilis primers were designed to amplify a 27-bp deletion variant in the third intron of the P. fluviatilis hsdl1 gene, which was identified as a male specific (Y-specific) variation based on the pool-seq analysis.Selected PCR primer sequences were forward 5′-ACA CTG ATC AAC ATT TTC TGT CTC A-3′ and reverse 5′-TGT TAA C AT TTG AGA ATT TTG CCT T-3′.PCRs were carried out as described above with the following cycling conditions: denaturation 96 °C for 3 min; 40 cycles of denaturation (96 °C, 30 s), annealing (60 °C, 30 s), and extension (72 °C, 30 min); final extension (72 °C, 5 min); storage at 4 °C.PCR amplicons were separated on 5% agarose gels (5% Biozym sieve 3:1 agarose, 1× TBE buffer, 5 V/cm, 1 h 40 min running time), and the amplicon derived from the amplification of the X-chromosome allele was used as a positive PCR control.In addition to this classical PCR sex-genotyping method, we also explored the sex linkage of some sex-specific SNVs in P. fluviatilis using Kompetitive Allele-Specific Polymerase chain reaction (KASPar) assays [79].Seven sexspecific SNVs were selected at different locations within the P. fluviatilis sex-determining region.Primers (Table 5) were designed using the design service available on the 3CR Bioscience website (www.3crbio.com/ free-assay-design).KASPar genotyping assays were carried out with a single end-point measure on a Q-PCR Light Cycler 480 (Roche) using the Agencourt ® DNAdvance kit (Beckman), following the manufacturer's instructions.

RNAseq expression analyses
Already available gonadal datasets of P. fluviatilis (age 9 month) RNAseq [80] [SRA accessions: SRR14461526 [81] and SRR14461527 [82]] were used to compare gene expression between ovary and testes for the gene models annotated in the SD region.Reads were mapped to our P. fluviatilis reference genome using HISAT2 [83] and transcript assembly and fragments per kilobase million (FPKM) values were calculated using STRINGTIE [84].

Fig. 1
Fig. 1 Time calibrated phylogenomic tree constructed from non-coding alignments of 36 Percomorpha genome assemblies reveals a massive radiation after the Cretaceous-Paleogene (K-Pg) boundary (66 Mya, dotted red line).The family Percidae is highlighted in yellow.Node numbers depict the median ages in Mya calculated by Mcmctree (values in brackets were taken from www. timet ree.org and used for calibration).Blue bars depict the 95% confidence intervals for the node ages.All branches of the tree obtained 100% support using the SH-aLRT (Shimodaira-Hasegawa-like approximate likelihood ratio test) and UFBS (ultra-fast bootstrap) tests

Fig. 4
Fig. 4 Chromosome 18 is the sex chromosome of Perca fluviatilis.A, B Genome-wide Manhattan plots of A male-and B female-specific single-nucleotide variations (SNVs), showing that chromosome 18 (Chr18) contains a 100-kb region, enriched for male-specific SNVs.Male-and female-specific SNVs are represented as dots (total of SNVs per 50-kb window size) of alternating colors on adjacent chromosomes.C Zoomed view of the male-specific SNVs on the sex-biased region of Chr18 with its gene annotation content (D): c18h1orf198 = c18h1orf198 homolog; cx32.2 (LOC120547048) = gap junction Cx32.2 protein-like; gja13.2= gap junction protein alpha 13.2; cx32.7 (LOC120546226) = gap junction Cx32.7 protein-like; hsdl1 = inactive hydroxysteroid dehydrogenase-like protein 1; tbc1d32 = tcb1 domain family member 32 (also known as protein broad-minded).Stars over the ruler of D are the locations of the single male-specific RAD maker (red star) and of the SNVs used for designing KASPar assays (black stars; SNV1-7).The location of the sex-specific intronic indel inside the hsdl1-gene used for sex-specific PCR is shown by a yellow star CAA GTT CAT GCT GAC ACA TAT TGT CCA TCT GAT GTA AAT G SNV1_AL2 Y GAA GGT CGG AGT CAA CGG ATT ATG ACA CAT ATT GTC CAT CTG ATG TAA ATT SNV1_C Common CAC CAC CAC TGA CTG AAG AAT AAT ATGAA SNV2_AL1 X GAA GGT GAC CAA GTT CAT GCT GGA CTG ATT GTG CTG CTT CTC TC SNV2_AL2 Y GAA GGT CGG AGT CAA CGG ATT GGA CTG ATT GTG CTG CTT CTC TT SNV2_C Common CAG ATG AGG AAG GAG GAG ATG CAT SNV3_AL1 X GAA GGT GAC CAA GTT CAT GCT TCA CCA CCA TAG AAC CACC SNV3_AL2 Y GAA GGT CGG AGT CAA CGG ATT CGC TTC ACC ACC ATA GAA CCA CT SNV3_C Common GGG ATG AGA TGC CAT TCT TCC AAA TAATA SNV4_AL1 Y GAA GGT GAC CAA GTT CAT GCT CGC CCT CAG CCT GGT TGAT SNV4_AL2 X GAA GGT CGG AGT CAA CGG ATT CGC CCT CAG CCT GGT TGAG SNV4_C Common TCG TCA TGC ACT CCT TCA CAG CTT T SNV5_AL1 X GAA GGT GAC CAA GTT CAT GCT GGA ATT TGC CTG AAA TAA TGA ATG AAT ATG SNV5_AL2 Y GAA GGT CGG AGT CAA CGG ATT GTG GAA TTT GCC TGA AAT AAT GAA TGA ATATA SNV5_C Common AGG ACA TTA CAG ATT GGT CAG ACC ATATA SNV6_AL1 X GAA GGT GAC CAA GTT CAT GCT TAC CCT TCT CAC CAC CTG TTT SNV6_AL2 Y GAA GGT CGG AGT CAA CGG ATT CTT ACC CTT CTC ACC ACC TGT TG SNV6_C Common CAT AGT TCT TAC CCT TAC TGT ACC AGATA SNV7_AL1 Y GAA GGT GAC CAA GTT CAT GCT CAG TGA ACC TCC CTA TGA GGC A SNV7_AL2 X GAA GGT CGG AGT CAA CGG ATT AGT GAA CCT CCC TAT GAG GCG SNV7_C Common CTC GGT ACA AGG TTG AAA GAT GAA AGATA different and specific PCR-amplicon sizes.Selected PCR primer sequences were forward: 5′-AGT TTA TTG TGT TAG TTT GGGCT-3′ and reverse: 5′-CAA ATA AAT CAG AGC AGC GCATC-3′.PCRs were carried out with 1U Platinum Taq DNA Polymerase and its corresponding Buffer (Thermo Fisher) supplemented with 0.8 mM dNTPs (0.2mM each), 1.5 mM MgCl 2 , and 0.2 µM of each primer with the following cycling conditions, 96 °C for 3 min; 40 cycles of denaturation (96 °C, 30 s), annealing (54 °C, 30 s), and extension (72 °C, 1 min); final extension (72 °C, 5 min); storage at 4 °C.PCR amplicons were separated on 1.5% agarose gels (1.5% std.agarose, 1× TBE buffer, 5 V/cm, running time 40 min) and the systematic amplification of the autosomal (amhr2a) amplicon was used as a positive PCR control.

Table 1
[9,24] assembly statistics for P. fluviatilis and P. schrenkii.For comparison, the table also provides numbers for two recently published Perca sp.assemblies[9,24]Abbreviations: LR long reads, HiC chromosome conformation capture, CG comparative genomics, 10X linked-read sequencing, GLM genetic linkage map

Table 2
BUSCO scoring of annotations of five Percidae genome assemblies (P.fluviatilis, P. schrenkii, S. vitreus from this study).The comparative mapping of high-quality NCBI/GNOMON annotations onto closely related species' genome assemblies is a cost-effective and fast procedure to annotate new genomes Abbreviations: C complete, S single copy, D duplicated, F fragmented, M missing

Table 3
Sex linkage of different sex markers in Perca fluviatilis and Perca schrenkii.Associations between each sex-specific marker and sex phenotypes are provided for both males and females (number of positive individuals/total number of individuals) along with the p-value of association with sex that was calculated for each species and population based on Pearson's chi-square test with Yates' continuity correction